LTAG-spinal treebank and parser for Hindi

Authors

  • Prashanth Mannem
  • Aswarth Abhilash
  • Akshar Bharati
Abstract

Statistical parsers need huge annotated treebanks to learn from, and building treebanks is expensive. Building a separate treebank for each grammar formalism in a language, in order to create parsers for those formalisms, is not feasible. A treebank available in one formalism can be converted into another, either automatically or with minimal human effort, by exploiting the similarities and differences between the two formalisms. In this work, we present an approach to extract an LTAG-spinal treebank from the Hyderabad Dependency Treebank for Hindi. LTAG-spinal is a variant of Lexicalized Tree Adjoining Grammar (LTAG) with desirable linguistic, computational and statistical properties. A bidirectional LTAG dependency parser is trained on the extracted treebank, and an LTAG dependency accuracy of 80.86% is reported.

Similar resources

LTAG-spinal and the Treebank a new resource for incremental, dependency and semantic parsing

Abstract. We introduce LTAG-spinal, a novel variant of traditional Lexicalized Tree Adjoining Grammar (LTAG) with desirable linguistic, computational and statistical properties. Unlike in traditional LTAG, subcategorization frames and the argument-adjunct distinction are left underspecified in LTAG-spinal. LTAG-spinal with adjunction constraints is weakly equivalent to LTAG. The LTAG-spinal for...

Statistical LTAG Parsing

Libin Shen, Aravind K. Joshi. In this work, we apply statistical learning algorithms to Lexicalized Tree Adjoining Grammar (LTAG) parsing, as an effort toward statistical analysis over deep structures. LTAG parsing is a well-known hard problem. Statistical methods successfully applied to LTAG parsing could also be used in many other structure prediction problems in NLP. F...

Statistical Morphological Tagging and Parsing of Korean with an LTAG Grammar

This paper describes a lexicalized tree adjoining grammar (LTAG) based parsing system for Korean which combines corpus-based morphological analysis and tagging with a statistical parser. Part of the challenge of statistical parsing for Korean comes from the fact that Korean has free word order and a complex morphological system. The parser uses an LTAG grammar which is automatically extracted u...

Exploration of the LTAG-Spinal Formalism and Treebank for Semantic Role Labeling

LTAG-spinal is a novel variant of traditional Lexicalized Tree Adjoining Grammar (LTAG) introduced by Shen (2006). The LTAG-spinal Treebank (Shen et al., 2008) combines elementary trees extracted from the Penn Treebank with Propbank annotation. In this paper, we present a semantic role labeling (SRL) system based on this new resource and provide an experimental comparison with CCGBank and a st...

Bidirectional Dependency Parser for Hindi, Telugu and Bangla

This paper describes the dependency parser we used in the NLP Tools Contest, 2009 for parsing Hindi, Bangla and Telugu. The parser uses a bidirectional parsing algorithm with two operations, proj and non-proj, to build the dependency tree. The parser obtained Labeled Attachment Scores of 71.63%, 59.86% and 67.74% for Hindi, Telugu and Bangla, respectively, on the treebank with fine-grained dependenc...


Publication date: 2009